Multiple Imputation for Disclosure Limitation: Future Research Challenges
Abstract
Statistical agencies that disseminate data to the public are ethically and often legally required to protect the confidentiality of respondents’ identities and sensitive attributes. To satisfy these requirements, Rubin (1993), Little (1993), and Fienberg (1994) proposed that agencies utilize multiple imputation. For example, agencies can release the units originally surveyed with some values, such as sensitive values at high risk of disclosure or values of key identifiers, replaced with multiple imputations. Multiple imputation for protecting confidentiality is often called the synthetic data approach.
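The partial-synthesis idea described above can be illustrated with a minimal Python sketch. It is not the method of any of the cited authors: the function name `partially_synthesize`, the toy records, the income threshold, and the unconditional normal model are all assumptions made for illustration. A real application would typically draw replacements from a posterior predictive distribution that conditions on the other variables and accounts for parameter uncertainty.

```python
import random
import statistics


def partially_synthesize(records, key, is_high_risk, m=5, seed=0):
    """Return m copies of `records` in which the value of `key` is replaced
    by a random draw for every record flagged by `is_high_risk`.

    Assumption: replacements come from a single normal distribution fitted
    to all observed values of `key`. This is a deliberately crude stand-in
    for a proper imputation model.
    """
    rng = random.Random(seed)
    values = [r[key] for r in records]
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)

    datasets = []
    for _ in range(m):
        released = []
        for r in records:
            new_r = dict(r)  # copy so the confidential originals stay untouched
            if is_high_risk(r):
                new_r[key] = rng.gauss(mu, sigma)  # synthetic replacement
            released.append(new_r)
        datasets.append(released)
    return datasets


# Hypothetical toy data: one very high income is treated as a disclosure risk.
records = [
    {"id": 1, "age": 34, "income": 41_000.0},
    {"id": 2, "age": 51, "income": 58_000.0},
    {"id": 3, "age": 47, "income": 250_000.0},
    {"id": 4, "age": 29, "income": 36_500.0},
]

synthetic = partially_synthesize(
    records, "income", lambda r: r["income"] > 100_000, m=3
)
```

Each of the `m` released data sets keeps the originally surveyed units and leaves unflagged values unchanged; only the flagged values differ from one data set to the next.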
Similar resources
A multiple imputation approach to disclosure limitation for high-age individuals in longitudinal studies.
Disclosure limitation is an important consideration in the release of public use data sets. It is particularly challenging for longitudinal data sets, since information about an individual accumulates with repeated measures over time. Research on disclosure limitation methods for longitudinal data has been very limited. We consider here problems created by high ages in cohort studies. Because o...
Distribution-preserving statistical disclosure limitation
One approach to limiting disclosure risk in public-use microdata is to release multiply-imputed, partially synthetic data sets. These are data on actual respondents, but with confidential data replaced by multiply-imputed synthetic values. A mis-specified imputation model can invalidate inferences because the distribution of synthetic data is completely determined by the model used to generate th...
Distribution-Preserving Statistical Disclosure Limitation
One approach to limiting disclosure risk in public-use microdata is to release multiply-imputed, partially synthetic data sets. These are data on actual respondents, but with confidential data replaced by multiply-imputed synthetic values. A mis-specified imputation model can invalidate inferences based on the partially synthetic data, because the imputation model determines the distribution of s...
Multiple imputation: an alternative to top coding for statistical disclosure control
Top coding of extreme values of variables like income is a common method of statistical disclosure control, but it creates problems for the data analyst. The paper proposes two alternative methods to top coding for statistical disclosure control that are based on multiple imputation. We show in simulation studies that the multiple-imputation methods provide better inferences of the publicly rel...
Using Multiple Imputation to Integrate and Disseminate Confidential Microdata
In data integration contexts, two statistical agencies seek to merge their separate databases in one file. The agencies also may seek to disseminate data to the public based on the integrated file. These goals may be complicated by the agencies’ need to protect the confidentiality of database subjects, which could be at risk during the integration or dissemination stage. This article proposes s...